Abstract

Genome-wide association studies (GWAS) are widely used to discover genetic variants associated with
diseases. To control false positives, all findings from GWAS need to be verified with additional evidence,
even for associations discovered in a high-power study. A replication study, which uses independent
samples, is a common verification method. An association is regarded as a true positive with high
confidence when it can be identified in both the primary study and the replication study. Currently,
there is no systematic study on the behavior of positives in the replication study when the positive
results of the primary study are considered as prior information.

In this paper, two probabilistic measures named Reproducibility Rate (RR) and False Irreproducibility 
Rate (FIR) are proposed to quantitatively describe the behavior of primary positive associations (i.e.
positive associations identified in the primary study) in the replication study. RR is a conditional probability
measuring how likely a primary positive association will also be positive in the replication study. This can be 
used to guide the design of replication study, and to check the consistency between the results of primary study 
and those of replication study. FIR, on the contrary, measures how likely a primary positive association may 
still be a true positive even when it is negative in the replication study. This can be used to generate a list 
of potentially true associations in the irreproducible findings for further scrutiny. The estimation methods 
of these two measures are given. Simulation results and real experiments show that our estimation methods 
have high accuracy and good prediction performance. 


1 Introduction 


Genome-wide association studies (GWAS) were designed to detect genetic variations associated with diseases
by genotyping single nucleotide polymorphisms (SNPs) in different individuals (Hirschhorn and Daly 2005).
Compared to traditional candidate gene studies (Tabor et al., 2002), which are based on pathway information,
GWAS avoid selection bias by genotyping a dense set of SNPs across the whole genome. Also, GWAS
are more powerful than linkage analysis in detecting genetic variants contributing to disease risk with
modest effect size (Risch and Merikangas 1996).

Since the first GWAS on age-related macular degeneration (AMD) (Klein et al. 2005), there have
been about 2000 GWAS reports so far, with 14609 associations showing genome-wide significance (p-value
≤ $5 \times 10^{-8}$) for 756 different diseases/traits (Hindorff et al. accessed [2015.05.28]). Taking advantage of
The basic analysis method used in GWAS is hypothesis testing (Balding 2006). In order to reduce
false positives, additional evidence is essential to verify the discoveries. Commonly, there are two strategies
used in GWAS to discover and examine associations: joint analysis and replication based analysis. Joint
analysis uses all available GWAS data for the same disease in the same population to find associated SNPs,
either by pooling multiple-stage genotyping data or by combining test statistics with meta-analysis.
Afterwards, extra biological experiments are conducted to verify the associations. Replication based analysis
splits the data into two parts, one for discovery (commonly called the primary study) and the other for
validation (commonly called the replication study). With only a subset of the available data being used in the
primary study, replication based analysis is less powerful than joint analysis (Skol et al., 2006). But it gives
us an alternative way to examine findings without carrying out additional experiments (NCI-NHGRI Working
Group on Replication in Association Studies, 2007; Kraft et al., 2009). Thus, replication based analysis is a
common method of choice when facing budget or study design constraints.

In this paper, replication based analysis is our focus. For a reproduced association, suppose the type I
error rates in the primary study and the replication study are $\alpha_1$ and $\alpha_2$, respectively. Then the
probability of observing statistics at least this extreme in both studies is below $\alpha_1 \alpha_2$ when the
association does not exist. Since this is a very low probability, the reproduced association is a true positive
with very high confidence. For associations irreproduced in the replication study, we may suspect that they
are false alarms, but usually we cannot say much more about them. However, given information from the
primary study, we would like to know more: how probable is it that a primary positive association (i.e. a
positive association identified in the primary study) will be confirmed in the replication study? What is the
probability that a primary positive association is still a true positive even if it fails to show significance
in the replication study? To answer these two questions, we need a systematic study of the behavior of
primary positives in the replication study. Unfortunately, no such study exists as yet.

The aim of this paper is to systematically study primary positives in the replication study setting and to
answer the two questions. Our contributions are as follows:


1. Reproducibility rate (RR) is proposed to quantify the probability that a primary positive association 
will be confirmed in the replication study. RR can be used to guide the design of replication study. 

2. False irreproducibility rate (FIR) is proposed to quantify the probability that a primary positive 
association is still a true association even when it cannot be confirmed in the replication study. FIR 
can be used to discover potentially true associations in the irreproduced results. 

3. Estimation methods are proposed for RR and FIR when the summary statistics of the primary study 
are available. In other words, RR and FIR can be estimated even before the replication study is 
carried out. This nice property allows us to explore all possibilities in the design of replication study. 


The rest of this paper is organized as follows. In section 2, we first give the mathematical definitions
of RR and FIR. We also derive the relationship among local false discovery rate (fdr, Efron 2005), power,
RR and FIR. Then we estimate RR and FIR using a Bayesian framework with a two-component mixture prior.
In section 3, we first show simulation results, which demonstrate that the estimation of RR and FIR works
well when the data agree with the model assumptions. Then we show empirical results using the Type 2
Diabetes (T2D) data from DIAbetes Genetics Replication And Meta-analysis (DIAGRAM, Morris et al. 2012)
and the Low Density Lipoprotein (LDL) cholesterol data from the Global Lipids Genetics Consortium (GLGC,
Global Lipids Genetics Consortium 2013). We also show other potential applications of RR and FIR in the
same section. In section 4, we discuss limitations of our current modeling and estimation method, which
provide guidance for future work. Section 5 concludes the paper.


2 Method 

2.1 RR and FIR 

As an illustration, we use the log(OR) test to identify associations. Here log(OR) stands for the logarithm
of the odds ratio. The model can be easily generalized to quantitative traits with one-way fixed-effects
ANOVA. Section 3.3 gives an example of RR and FIR estimation for a GWAS with a quantitative trait.

In the replication based analysis of GWAS, let’s assume study $j$ ($j = 1, 2$ denote the primary study and
the replication study, respectively) has $n^{(j)}$ individuals, where $n_0^{(j)}$ of them are controls and
$n_1^{(j)}$ are cases. The number of SNPs is $m$. We use $\pi_0$ to denote the proportion of null SNPs, which
have no association with the disease.

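For each SNP, we use A to represent the non-effect allele, and a to denote the effect allele. Table 1 shows
a contingency table of alleles. Using the contingency table, we can estimate the logarithm of the odds ratio

$$\hat{\mu}^{(j)} = \log n_{1a}^{(j)} - \log n_{1A}^{(j)} - \log n_{0a}^{(j)} + \log n_{0A}^{(j)}, \tag{2.1}$$

where $n_{iX}^{(j)}$ is the number of alleles of type $X \in \{A, a\}$ observed in the controls ($i = 0$) or
cases ($i = 1$) of study $j$.

The true effect size $\mu$ is normally unknown. Using Woolf’s method, we can approximate the asymptotic
standard error of $\hat{\mu}^{(j)}$ (denoted as $\sigma^{(j)}$) as (Woolf 1955):

$$\sigma^{(j)} \approx \sqrt{\frac{1}{n_{0A}^{(j)}} + \frac{1}{n_{0a}^{(j)}} + \frac{1}{n_{1A}^{(j)}} + \frac{1}{n_{1a}^{(j)}}}. \tag{2.2}$$

The null and alternative hypotheses are

$$\mathcal{H}_0: \mu = 0 \quad \text{vs.} \quad \mathcal{H}_1: \mu \neq 0. \tag{2.3}$$

The corresponding test statistic is $z^{(j)} = \hat{\mu}^{(j)} / \sigma^{(j)}$. Let’s assume the significance levels
in the two studies are $\alpha_1$ and $\alpha_2$, respectively.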
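
The log(OR) test above can be sketched in a few lines of Python. This is a minimal illustration, not the original analysis code: the function name `log_or_test` and its argument names are hypothetical, the four inputs are assumed to be the allele counts of the contingency table, and the standard error uses the usual sum-of-reciprocals form of Woolf's approximation.

```python
import math


def log_or_test(ctrl_A, ctrl_a, case_A, case_a):
    """Return (log odds ratio, Woolf standard error, z-value) for one SNP.

    Arguments are allele counts (hypothetical names): non-effect allele A
    and effect allele a, counted separately in controls and in cases.
    """
    counts = (ctrl_A, ctrl_a, case_A, case_a)
    if min(counts) <= 0:
        raise ValueError("all four allele counts must be positive")
    # log(OR) = log(case_a / case_A) - log(ctrl_a / ctrl_A)
    log_or = (math.log(case_a) - math.log(case_A)
              - math.log(ctrl_a) + math.log(ctrl_A))
    # Woolf's approximation: square root of the sum of reciprocal counts
    se = math.sqrt(sum(1.0 / c for c in counts))
    return log_or, se, log_or / se
```

A positive z-value here means the effect allele a is more frequent in cases than in controls.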
Table 1: Contingency table of one SNP in study j. Please see the main text for explanation of the notations.


Since a two-sided test is used in the primary study, a SNP showing association with the disease has the
absolute value of its z-value larger than $z_{\alpha_1/2}$, i.e. $|z^{(1)}| > z_{\alpha_1/2}$, where $z_u$ is the
upper $u$ quantile of the standard normal distribution ($0 < u < 0.5$). For a positive SNP confirmed in the
replication study, its z-value should be consistent with the z-value in the primary study. Thus, $z^{(2)}$
should have the same sign as $z^{(1)}$, and should also be larger than $z_{\alpha_2}$ in terms of absolute value,
i.e. $\mathrm{sgn}(z^{(1)}) z^{(2)} > z_{\alpha_2}$, where the sign function reads

$$\mathrm{sgn}(x) = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x < 0. \end{cases} \tag{2.4}$$

The reason that the critical value is $z_{\alpha_2}$ instead of $z_{\alpha_2/2}$ is that the sign of the rejection
region is fixed, so the test can be regarded as one-sided.
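
The two decision rules just described (two-sided in the primary study, sign-consistent one-sided in the replication study) can be written as a short sketch. The function names are hypothetical; the quantiles come from Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist


def upper_quantile(u):
    """Upper u quantile z_u of the standard normal: P(Z > z_u) = u."""
    return NormalDist().inv_cdf(1.0 - u)


def is_primary_positive(z1, alpha1):
    """Two-sided test in the primary study: |z1| > z_{alpha1/2}."""
    return abs(z1) > upper_quantile(alpha1 / 2.0)


def is_reproduced(z1, z2, alpha2):
    """One-sided test in the direction of z1: sgn(z1) * z2 > z_{alpha2}."""
    sign = (z1 > 0) - (z1 < 0)
    return sign * z2 > upper_quantile(alpha2)
```

For example, a primary z-value of 5.5 combined with a replication z-value of -3.0 is not counted as reproduced at any level, because the effect direction flipped.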

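For a SNP revealing association in the primary study, RR is defined as

$$RR = P(\mathrm{sgn}(z^{(1)}) Z^{(2)} > z_{\alpha_2} \mid z^{(1)}), \quad \text{where } |z^{(1)}| > z_{\alpha_1/2}. \tag{2.5}$$

Correspondingly, FIR is defined as

$$FIR = P(\mathcal{H}_1 \mid \mathrm{sgn}(z^{(1)}) Z^{(2)} \le z_{\alpha_2}, z^{(1)}), \quad \text{where } |z^{(1)}| > z_{\alpha_1/2}. \tag{2.6}$$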

Bayes formula can be used to derive the relationship among RR, FIR, the local false discovery rate of the
primary study and the power function of the replication study (details are in the appendix):

$$RR = fdr^{(1)} \alpha_2 + (1 - fdr^{(1)}) \eta^{(2)}, \qquad FIR = \frac{(1 - fdr^{(1)})(1 - \eta^{(2)})}{1 - RR}, \tag{2.7}$$

where $fdr^{(1)} = P(\mathcal{H}_0 \mid z^{(1)})$, and $\eta^{(2)} = P(\mathrm{sgn}(z^{(1)}) Z^{(2)} > z_{\alpha_2} \mid z^{(1)}, \mathcal{H}_1) = E(\beta^{(2)}(\mu) \mid z^{(1)}, \mathcal{H}_1)$
is the Bayesian predictive power (Lecoutre 2001) of the replication study, which averages the power
$\beta^{(2)}(\mu)$ over all possible effect size values given the test statistic in the primary study.
As indicated by Eq. (2.7), RR can be regarded as a weighted average of the true null component $\alpha_2$
and the true associated component $\eta^{(2)}$, where $fdr^{(1)}$ and $1 - fdr^{(1)}$ are the weights, respectively.
FIR is the proportion of the weighted true associated component $(1 - fdr^{(1)})(1 - \eta^{(2)})$ in the
irreproducibility rate, namely

$$1 - RR = fdr^{(1)}(1 - \alpha_2) + (1 - fdr^{(1)})(1 - \eta^{(2)}). \tag{2.8}$$

Thus, RR and FIR can be calculated once $fdr^{(1)}$ and $\eta^{(2)}$ are known.

Both $fdr^{(1)}$ and $\eta^{(2)}$ are posterior probabilities which depend on the distribution of the underlying
true effect size $\mu$. We need to specify a prior distribution of $\mu$ for the calculation of $fdr^{(1)}$ and
$\eta^{(2)}$. In the following subsection, we will use a two-component mixture prior for $\mu$ to derive their
calculation formulas.
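
Once the local false discovery rate and the Bayesian predictive power are available, Eqs. (2.7) and (2.8) reduce to simple arithmetic. The sketch below assumes both quantities have already been computed; the function name `rr_and_fir` and its argument names are hypothetical.

```python
def rr_and_fir(fdr1, eta2, alpha2):
    """Combine fdr of the primary study, Bayesian predictive power and
    the replication significance level into (RR, FIR), per Eq. (2.7)."""
    for name, value in (("fdr1", fdr1), ("eta2", eta2), ("alpha2", alpha2)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    # RR: weighted average of the null and the associated component
    rr = fdr1 * alpha2 + (1.0 - fdr1) * eta2
    # 1 - RR, written as in Eq. (2.8) to avoid cancellation near RR = 1
    irreproducibility = fdr1 * (1.0 - alpha2) + (1.0 - fdr1) * (1.0 - eta2)
    if irreproducibility == 0.0:
        raise ValueError("FIR is undefined when RR equals 1")
    fir = (1.0 - fdr1) * (1.0 - eta2) / irreproducibility
    return rr, fir
```

With illustrative inputs fdr = 0.1, predictive power 0.8 and a replication level of 0.05, this gives RR = 0.725 and FIR = 0.18 / 0.275, roughly 0.65.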


2.2 Two-component mixture prior 

In each study, the log(OR) estimator can be assumed normally distributed with mean $\mu$ and standard
deviation $\sigma^{(j)}$, i.e.

$$\frac{\hat{\mu}^{(j)} - \mu}{\sigma^{(j)}} \sim N(0, 1). \tag{2.9}$$

The true effect size $\mu$ is unknown. Research on heritability decomposition (Yang et al. 2010) and effect
size distribution (Park et al. 2010) suggests that SNPs with small effect sizes make up a larger proportion
of the associated SNPs than those with large effect sizes. Hence, a natural prior for the effect size of the
associated SNPs is a Gaussian prior with mean zero. Since we don’t know whether an arbitrary SNP is
associated or not, the following two-component mixture prior is used for all SNPs:

$$\mu \sim \pi_0 \delta_0 + (1 - \pi_0) N(0, \sigma_0^2), \tag{2.10}$$

where $\delta_0$ is the distribution with point mass on zero, whose probability density function (pdf) is the
Dirac function $\delta(x)$.

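The local false discovery rate of the primary study can be calculated as:

$$fdr^{(1)} = \frac{\pi_0 \phi(z^{(1)})}{\pi_0 \phi(z^{(1)}) + \pi_1 \frac{1}{\sqrt{1 + (\sigma_0/\sigma^{(1)})^2}} \phi\left(\frac{z^{(1)}}{\sqrt{1 + (\sigma_0/\sigma^{(1)})^2}}\right)}, \tag{2.11}$$

where $\pi_1 = 1 - \pi_0$ and $\phi(x)$ is the pdf of the standard normal distribution.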

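The Bayesian predictive power of the replication study can be calculated as follows (details in the
appendix):

$$\eta^{(2)} = \Phi\left(\frac{|z^*| - z_{\alpha_2}}{\sigma^*}\right), \tag{2.12}$$

where $z^* = \lambda \hat{\mu}^{(1)} / \sigma^{(2)}$, $\sigma^* = \sqrt{1 + \lambda (\sigma^{(1)}/\sigma^{(2)})^2}$,
$\lambda = \frac{1}{1 + (\sigma^{(1)}/\sigma_0)^2}$, and $\Phi(x)$ is the cumulative distribution function (cdf)
of the standard normal distribution.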
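
The two quantities of this subsection can be evaluated directly from the primary z-value and the two standard errors. The following is a sketch under the two-component mixture prior of Eq. (2.10), not the authors' implementation: the function and argument names are hypothetical, and the mixture density of the z-value under the alternative is taken to be that of a zero-mean normal with variance 1 + (sigma0/se1)^2.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def local_fdr(z1, se1, pi0, sigma0_sq):
    """Local false discovery rate of the primary study, P(H0 | z1)."""
    s = math.sqrt(1.0 + sigma0_sq / se1 ** 2)  # sd of z1 under H1
    null = pi0 * _STD.pdf(z1)
    alt = (1.0 - pi0) * _STD.pdf(z1 / s) / s
    return null / (null + alt)


def predictive_power(z1, se1, se2, sigma0_sq, alpha2):
    """Bayesian predictive power of the replication study under H1."""
    lam = 1.0 / (1.0 + se1 ** 2 / sigma0_sq)   # shrinkage factor
    z_star = lam * se1 * z1 / se2              # predictive mean of z2
    s_star = math.sqrt(1.0 + lam * (se1 / se2) ** 2)
    z_crit = _STD.inv_cdf(1.0 - alpha2)        # upper alpha2 quantile
    return _STD.cdf((abs(z_star) - z_crit) / s_star)
```

Both functions depend on the sign of the primary z-value only through its absolute value, which matches the sign-consistent definition of replication.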

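When summary statistics of the primary study are available, the asymptotic standard error of $\hat{\mu}^{(2)}$
can be approximated by substituting observed allele frequencies from the primary study into Woolf’s method:

$$\sigma^{(2)} \approx \sqrt{\frac{n_0^{(1)}}{n_0^{(2)}}\left(\frac{1}{n_{0A}^{(1)}} + \frac{1}{n_{0a}^{(1)}}\right) + \frac{n_1^{(1)}}{n_1^{(2)}}\left(\frac{1}{n_{1A}^{(1)}} + \frac{1}{n_{1a}^{(1)}}\right)}, \tag{2.13}$$

where $n_{iX}^{(1)}$ denotes the primary-study count of allele $X$ in controls ($i = 0$) or cases ($i = 1$).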


2.3 Estimation 

Clearly, RR, FIR, $fdr^{(1)}$ and $\eta^{(2)}$ depend on the hyperparameters $\pi_0$ and $\sigma_0^2$. Since all
SNPs are assumed to share the same effect size distribution in Eq. (2.10), the hyperparameters can be
estimated with the test statistics of the primary study.

The estimation of $\pi_0$ has been addressed in the literature of FDR control from the Bayesian point of
view (Storey and Tibshirani 2003). Suppose there is a “zero assumption” that all SNPs with p-value $> \gamma$
have almost no chance to be truly associated SNPs. Let’s denote the number of those SNPs as $m^+(\gamma)$.
Then its expectation is

$$E(m^+(\gamma)) = m P(\text{p-value} > \gamma, \mathcal{H}_0) = m P(\mathcal{H}_0) P(\text{p-value} > \gamma \mid \mathcal{H}_0) = m \pi_0 (1 - \gamma), \tag{2.14}$$

which introduces a $\pi_0$ estimator

$$\hat{\pi}_0 = \frac{m^+(\gamma)}{m(1 - \gamma)}. \tag{2.15}$$

There is a tradeoff between bias and variance when choosing $\gamma$ in the estimation of $\pi_0$. Storey and
Tibshirani (2003) proposed a procedure that does not require tuning the parameter $\gamma$. The automated
procedure evaluates $\hat{\pi}_0$ at different values of $\gamma$. Then, a natural cubic spline is fitted to those
evaluated values. The final $\hat{\pi}_0$ is obtained at $\gamma = 1$ of the fitted spline.
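
For a single fixed threshold, the estimator in Eq. (2.15) is a one-line computation. The sketch below covers only that fixed-threshold version, not the spline-based automated procedure; the function name `estimate_pi0`, the default threshold of 0.5 and the truncation of the estimate at 1 are assumptions made for illustration.

```python
def estimate_pi0(p_values, gamma=0.5):
    """Fixed-threshold estimate of the null proportion, Eq. (2.15).

    Counts p-values above gamma and rescales by m * (1 - gamma).
    The result is capped at 1 (an added safeguard, not part of Eq. 2.15).
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    m_plus = sum(1 for p in p_values if p > gamma)
    return min(1.0, m_plus / (m * (1.0 - gamma)))
```

A larger threshold uses fewer p-values (higher variance) but is less contaminated by true associations (lower bias), which is the tradeoff the automated procedure is designed to resolve.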

The estimator of $\sigma_0^2$ reads (see appendix for detail):


^2 _ 
0 — 


(1 - TTo) 





( 2 . 16 ) 


For each SNP showing association in the primary study, Bootstrap can be used to obtain the confidence 
interval of RR and FIR. 
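
One simple way to obtain an estimate of the effect size variance is moment matching. Under Eqs. (2.9) and (2.10), the expected squared log(OR) estimate of a SNP equals its squared standard error plus (1 - pi0) times sigma0 squared, so the average excess of squared estimates over squared standard errors can be rescaled by 1 - pi0. The sketch below implements this idea; it is an assumption for illustration and is not claimed to coincide with Eq. (2.16). The function name and the truncation at zero are hypothetical choices.

```python
def estimate_sigma0_sq(log_or, se, pi0):
    """Moment-matching estimate of the effect size variance sigma0^2.

    Uses E[log_or^2] = se^2 + (1 - pi0) * sigma0^2 under the mixture
    prior; this is an illustrative estimator, not necessarily Eq. (2.16).
    """
    if len(log_or) != len(se) or len(log_or) == 0:
        raise ValueError("log_or and se must be non-empty and equal length")
    if not 0.0 <= pi0 < 1.0:
        raise ValueError("pi0 must lie in [0, 1)")
    # Average excess of squared estimates over their sampling variance
    excess = sum(b * b - s * s for b, s in zip(log_or, se)) / len(log_or)
    # Truncate at zero so that the variance estimate is never negative
    return max(0.0, excess / (1.0 - pi0))
```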


3 Result 


3.1 Simulation experiments 

We use simulation experiments to check the following questions: 

1. Can RR and FIR be accurately estimated? 

2. How is the prediction performance of RR?

(a) Can RR predict whether a primary association will be reproduced in the replication study? 

(b) Can RR describe the reproducibility probability well? 

3. Can FIR predict whether an irreproduced primary association is true positive or not? 

We simulate 2000 controls and 2000 cases in the primary study, and 1000 controls and 1000 cases in 
the replication study. The number of SNPs is 1 x 10^. The effect sizes of all SNPs are generated from the 
following two-component distribution: 


$$\mu \sim 0.9\,\delta_0 + 0.1\,N(0, 0.04). \tag{3.1}$$

The minor allele frequencies are randomly simulated from a uniform distribution $U(0.05, 0.5)$, and the
prevalence of the disease is set to 1%. We use $\alpha_1$ = 5 x 10“^ and $\alpha_2$ = 5 x 10“^ as significance
levels in the primary study and replication study, respectively.

Figure 1 shows the comparison between the estimated RR and FIR and their true values. The two scatter
plots show that both estimators work well in terms of estimation accuracy. This experiment was run 5
times. The root mean square errors of the two estimators in Table 2 show that the RR and FIR estimates
have high accuracy.

In order to see whether RR can predict the replication status well, we use RR as a score to decide whether
the association can be reproduced or not in the replication study. A Precision-Recall (PR) curve is drawn
(Figure 2a) by using different thresholds in the prediction. The reason that we choose the PR curve instead
of the commonly used Receiver Operating Characteristic (ROC) curve is that the replication status in the
positive findings can be highly imbalanced, and the PR curve can give more information about the prediction
performance in the imbalanced situation (Davis and Goadrich 2006). A high RR value predicts that the
association can be reproduced. The area under the PR curve is 0.924 in this simulation. This large area
indicates that RR has good prediction performance as an index of reproducibility.
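
The area under a PR curve can be summarized in several ways, and the text does not state which one is used. The sketch below computes average precision, one common estimator, as an illustration only: the function name is hypothetical, scores stand for estimated RR values, labels for observed replication status, and tied scores are not treated specially.

```python
def average_precision(scores, labels):
    """Average precision: mean of the precision values obtained each time
    a positive item is reached when ranking by decreasing score."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have equal length")
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            true_pos += 1
            total += true_pos / rank  # precision at this recall level
    return total / n_pos
```

A score that ranks every reproduced association above every irreproduced one attains the maximum value of 1.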

We use the following procedure to see whether RR can describe the reproducibility probability. 
Figure 1: The estimated RR and FIR are close to their true values. (a) RR; (b) FIR. The x-axis is the true
value of RR (in (a)) or FIR (in (b)) in the simulation study, and the y-axis is the corresponding
estimated value. The solid line is y = x.



           RR      FIR
run 1      0.004   0.000
run 2      0.017   0.002
run 3      0.000   0.001
run 4      0.010   0.001
run 5      0.013   0.004
Average    0.009   0.002

Table 2: Root mean square error of the RR and FIR estimates in the simulation experiments.


1. The associations are partitioned into groups of approximately equal size according to RR. With 10
groups, the first group contains the 1/10 of the associations having the highest RR, the second
group contains the next 1/10 of the associations (the second decile of RR), and so on.


2. The proportion of reproduced associations in each group is defined as the reproducibility proportion
(RP). The mid-point of the range of RR in a group is regarded as the RR of that group. RP and RR
are compared in each group.
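
The two-step procedure above can be sketched as follows. This is an illustrative implementation with hypothetical names; it assumes one estimated RR value and one observed replication status per association, and it splits the ranked associations into groups whose sizes differ by at most one.

```python
def reproducibility_by_group(rr_values, reproduced, n_groups=10):
    """Return a list of (group RR, RP) pairs, highest-RR group first.

    Group RR is the mid-point of the range of RR within the group;
    RP is the proportion of reproduced associations in the group.
    """
    if len(rr_values) != len(reproduced):
        raise ValueError("inputs must have equal length")
    n = len(rr_values)
    order = sorted(range(n), key=lambda i: rr_values[i], reverse=True)
    result = []
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not members:
            continue  # fewer associations than groups
        rrs = [rr_values[i] for i in members]
        group_rr = (min(rrs) + max(rrs)) / 2.0
        rp = sum(1 for i in members if reproduced[i]) / len(members)
        result.append((group_rr, rp))
    return result
```

If RR is well calibrated, the returned pairs should lie close to the line y = x.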


Figure 2b shows the comparison between RP and RR for 10 groups. We can see that these two quantities
agree well. The correlation between them is 0.987. This result implies that RR can predict the
reproducibility of findings.

For the irreproduced findings, since we know whether each is a true association or not in the simulation,
a PR curve can be drawn for FIR (Figure 3). A high FIR value predicts the irreproduced association to be a
true association. The area under the curve is 0.998 in this simulation, which indicates that FIR has good
prediction performance as an index of the potential of being a true association.
Figure 2: RR of an association can predict its reproducibility in the simulation study. (a) PR curve of RR
(AUPRC = 0.924). We use RR as a score to decide the reproduced/irreproduced status in the replication
study. A PR curve is drawn by using different thresholds. The x-axis is the recall in reproducibility
prediction in terms of RR, and the y-axis is the corresponding precision. AUPRC is the area under the
precision-recall curve. (b) Reproducibility Proportion (RP) vs. RR (ρ = 0.987). The associations are
partitioned into 10 groups according to RR. The x-axis is the RR of the group, which is the mid-point of
the range of RR within the group. The y-axis is the corresponding RP of the group, which is the proportion
of the reproduced associations in each group. The solid line is y = x.


3.2 T2D data from DIAGRAM 


The public T2D dataset from DIAGRAM is used to further check our RR estimation accuracy. For the primary
study, 56862 individuals are in the control group and 12171 individuals are in the case group. For the
replication study, the sample size of the control group is 55647, and the sample size of the case group is
21491. After filtering out SNPs with p-value < 0.01 in the test of homogeneity, there are m = 89659 SNPs
remaining. The significance level used in the primary study is the genome-wide significance level
$5 \times 10^{-8}$. The significance level used in the replication study is 5 x 10“®. The estimated proportion
of null hypotheses is $\hat{\pi}_0$ = 0.924, and the estimated effect size variance is $\hat{\sigma}_0^2$ = 5.43 x 10“^.
There are 177 SNPs showing significant associations in the primary study, from which 24 clumps are formed.
Each clump contains SNPs in a nearby region (< 250kb) with strong linkage disequilibrium between them
($r^2$ > 0.5). The SNP showing the strongest association in each clump is selected to estimate RR and FIR.
The results for all clumps are shown in the appendix.

A precision-recall curve is drawn for the prediction of reproducibility based on RR (Figure 4a). The
area under the precision-recall curve is 0.991. In comparison, if the p-value is regarded as an index
describing reproducibility, then the area under the curve is 0.949, smaller than the area for RR. In order to
see whether RR can describe the reproducibility probability for an association, we calculate RP by
partitioning the clumps into 5 groups according to their RR values, and make a comparison between RP and
RR. We use 5 groups here instead of the 10 groups used in the simulation study because the number of
clumps is much smaller than the number of associations in the simulation experiments. Figure 4b shows the
comparison between RP and RR. These two quantities agree well. The correlation between them is 0.983. This
result illustrates that RR can well represent the probability of being reproduced for each of the findings
in these data.

There are also 5 irreproduced clumps in the result. While we may suspect that they are false positives if we 
follow the traditional strategy, their FIR values indicate that they are all very likely to be true associations. 
Figure 3: Precision-recall curve of FIR in the simulation study (panel label: FIR, AUPRC = 0.996). FIR of
an irreproduced finding can be a quantitative index to describe the potential that this finding is a true
association. The x-axis is the recall in false irreproducibility prediction in terms of FIR, and the y-axis
is the corresponding precision. AUPRC is the area under the precision-recall curve.


To verify this, we used meta-analysis to increase the power of the study. The corresponding p-values of
these clumps are indeed smaller than the genome-wide significance level.


3.3 LDL Cholesterol data from GLGC 

We also conducted experiments on the published LDL cholesterol data from GLGC. The phenotype measured in
the study is quantitative. The estimated and standardized regression coefficients are used as test
statistics. For the primary study, there are about 93982 individuals. For the replication study, the sample
size is around 94565. After filtering out SNPs with p-value < 0.01 in the test of homogeneity, there are
m = 81942 SNPs. The significance level in the primary study is 5 x 10“®, and the significance level in the
replication study is 5 X 10“®. The estimated proportion of null hypotheses is $\hat{\pi}_0$ = 0.905, and the
estimated effect size variance is $\hat{\sigma}_0^2$ = 6.20 x 10“^. There are 748 SNPs showing significant
associations in the primary study, forming 161 clumps. The SNP showing the strongest association in each
clump is selected for estimating RR and FIR. The estimated results for all clumps are shown in the appendix.


A precision-recall curve is drawn for the prediction of reproduced status based on RR (Figure 5a). The
area under the precision-recall curve is 0.968. In comparison, if the p-value is regarded as an index
describing reproducibility, then the area under the curve is 0.919, smaller than the area for RR. To see
whether RR can describe the reproducibility probability for an association, we calculate RP by partitioning
the clumps into 5 groups according to their RR. Figure 5b shows the good agreement between RP and RR in the
5 groups of clumps. The correlation coefficient between them is 0.97.

There are 29 irreproduced clumps in the result. Their FIR values are all larger than 0.99, indicating that 
they have high possibility to be true associations. We carried out meta-analysis again. The corresponding 
p-values are all smaller than 5 x 10“®. 


3.4 Other potential applications of RR 

We can use RR to determine the sample size needed in the replication study to achieve an expected
reproducibility probability for the primary associations. Also, we can use RR to check the consistency between
Figure 4: Reproducibility prediction in T2D data from DIAGRAM. (a) Precision-recall curve (RR,
AUPRC = 0.991; p-value, AUPRC = 0.949). The x-axis is the recall in reproducibility prediction in terms of
RR, and the y-axis is the corresponding precision. AUPRC is the area under the precision-recall curve. Both
the PR curve based on RR (solid line) and the PR curve based on p-value (dashed line) are drawn in the
figure. According to their AUPRC values, RR predicts reproducibility better than p-value. (b)
Reproducibility Proportion (RP) vs. RR. The associations are partitioned into 5 groups according to RR. The
x-axis is the RR of the group, which is the mid-point of the range of RR values. The y-axis is the
corresponding RP of the group, which is the proportion of the reproduced associations in each group. The
solid line is y = x.


the primary study and the replication study when both of them are conducted. We will describe these two 
potential applications in this subsection. 

The following three methods can be used to determine the sample size of a replication study: 


1. The traditional sample size determination method is based on power calculation. A minimum effect size
to be detected ($\mu_{\min}$) is specified beforehand. Then, the sample size is determined such that the
calculated statistical power is larger than a threshold, e.g. $\beta^{(2)}(\mu_{\min}) > 80\%$. This traditional
power analysis treats the replication study as another independent primary study. The connection between
the primary study and the replication study is not utilized in the design. The choice of $\mu_{\min}$ may be
arbitrary, and bias may occur in its specification. These make the determined sample size subjective.


2. Taking advantage of the connection between the primary study and the replication study, the sample size
of the replication study can be determined based on the Bayesian predictive power $\eta^{(2)}$ (Wang 2007).
But if we consider the major question of a replication study, which is how likely the primary positive
finding will be replicated, $\eta^{(2)}$ does not directly address this question. For example,
$\eta^{(2)} = 80\%$ doesn’t mean that a primary positive finding has an 80% chance of being replicated.


3. RR is a comprehensive measure directly addressing the question of replication. For the primary study’s
results, RR is related to $\eta^{(2)}$ by a one-to-one mapping, with a better interpretability of
“replication”. If the sample size of the replication study is specified with RR = 80%, then a primary
association has a probability of 80% of being replicated.


According to the above analysis, it is more natural to design a replication study based on RR. A very nice
property of RR is that we do not need to carry out the replication study in order to estimate RR. This
provides a huge advantage in exploring all possibilities when designing the replication study.
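
As an illustration of RR-based design, the sketch below searches for the smallest replication sample size at which the estimated RR of one primary association reaches a target. It is a sketch under stated assumptions, not the authors' procedure: the function names are hypothetical, RR is computed from Eq. (2.7) under the two-component mixture prior, and the replication standard error is assumed to scale as the primary standard error times sqrt(n1 / n2), i.e. the same case-control ratio and allele frequencies in both studies.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def estimated_rr(z1, se1, se2, pi0, sigma0_sq, alpha2):
    """RR from Eq. (2.7) under the two-component mixture prior."""
    s = math.sqrt(1.0 + sigma0_sq / se1 ** 2)
    null = pi0 * _STD.pdf(z1)
    alt = (1.0 - pi0) * _STD.pdf(z1 / s) / s
    fdr1 = null / (null + alt)
    lam = 1.0 / (1.0 + se1 ** 2 / sigma0_sq)
    z_star = lam * se1 * abs(z1) / se2
    s_star = math.sqrt(1.0 + lam * (se1 / se2) ** 2)
    eta2 = _STD.cdf((z_star - _STD.inv_cdf(1.0 - alpha2)) / s_star)
    return fdr1 * alpha2 + (1.0 - fdr1) * eta2


def min_replication_size(z1, se1, n1, pi0, sigma0_sq, alpha2,
                         target=0.8, n_max=10 ** 9):
    """Smallest n2 <= n_max with RR >= target, or None if unattainable."""
    def rr_at(n2):
        # Assumption: standard error scales as 1 / sqrt(sample size)
        se2 = se1 * math.sqrt(n1 / n2)
        return estimated_rr(z1, se1, se2, pi0, sigma0_sq, alpha2)

    if rr_at(n_max) < target:
        return None  # RR is bounded above, so some targets are out of reach
    lo, hi = 0, n_max  # rr_at(hi) >= target; RR increases with n2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if rr_at(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi
```

Because the computation needs only primary-study summary statistics, the search can be repeated for different significance levels or targets before any replication data are collected.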















Figure 5: Reproducibility prediction in LDL Cholesterol data from GLGC. (a) Precision-recall curve. The
x-axis is the recall in reproducibility prediction in terms of RR, and the y-axis is the corresponding
precision. AUPRC is the area under the precision-recall curve. Both the PR curve based on RR (solid line)
and the PR curve based on p-value (dashed line) are drawn in the figure. According to their AUPRC values,
RR predicts reproducibility better than p-value. (b) Reproducibility Proportion (RP) vs. RR (ρ = 0.97). The
associations are partitioned into 5 groups according to RR. The x-axis is the RR of the group, which is the
mid-point of the range of RR. The y-axis is the corresponding RP of the group, which is the proportion of
the reproduced associations in each group. The solid line is y = x.


Another application of RR is quality checking. In normal scenarios, the results of the replication study
are consistent with the RR values. If inconsistency occurs, we should be alarmed, and the potential sources
of inconsistency should be analyzed. These sources may be attributed to factors influencing either the
primary study’s or the replication study’s results, such as bias and measurement errors (Ioannidis 2006).


4 Discussion 


Please note that if $fdr^{(1)} > 0$, which is usually the case for a primary positive association, RR has an
upper limit which is smaller than 1. According to Eq. (2.7),

$$RR = fdr^{(1)} \alpha_2 + (1 - fdr^{(1)}) \eta^{(2)} \le 1 - fdr^{(1)}(1 - \alpha_2), \tag{4.1}$$

where equality is achieved if $\eta^{(2)} = 1$. This indicates that the influence of the null distribution
(namely $\alpha_2$) never disappears for a primary positive association with $fdr^{(1)} > 0$. The Bayesian
predictive power $\eta^{(2)}$ can be increased by increasing the sample size of the replication study. In the
situation of $fdr^{(1)} > 0$, no matter how many individuals participate in the replication study, the
primary association will not have a 100% probability of being reproduced.

Also, since an unbiased testing method is used in the replication study, i.e. $\alpha_2 < \eta^{(2)}$, we have
$RR < \eta^{(2)}$ according to Eq. (2.7). This indicates that, for a primary association in a designed
replication study, the probability of being reproduced is smaller than its Bayesian predictive power.

At first glance, people may regard the p-value as a quantitative index of reproducibility: an association
with a lower p-value has a higher probability of being reproduced than an association with a higher
p-value. The argument is that the p-values of associations have the same ordering as the local false
discovery rates, which are the probabilities of the corresponding hypotheses being null given their test
statistics. But a low probability of being null does not mean a high probability of being reproduced.
Hence, unlike RR, the p-value is not an index that describes reproducibility directly.

The accuracy of RR and FIR estimation relies on the accuracy of $\hat{\pi}_0$. Although we apply the method
of Storey and Tibshirani (2003) to estimate $\pi_0$, there exist other options. For example, when the “zero
assumption” is violated in the data or the true null distribution of the test statistics does not agree
with the theoretical distribution (Efron 2004), it may be better to use the methods proposed by Langaas et
al. (2005) or Jin and Cai (2007) for a reliable estimation of $\pi_0$.


Our model of RR and FIR is limited by the assumption that SNPs are independent. In reality, correlation
between SNPs, such as linkage disequilibrium, is common. An adjusted model for RR and FIR that accounts for
correlation is needed in the future.


5 Conclusion 

In replication based analysis, positive associations identified in the primary study need to be verified in the 
replication study. In this paper, we presented a Bayesian framework to systematically study the behavior of 
those primary findings in the replication study, and proposed two new probabilistic measures, reproducibility 
rate (RR) and false irreproducibility rate (FIR), to quantify the behavior. 

RR is proposed to quantify the reproducibility probability for every finding. An estimation method is
provided for RR based on the summary statistics of the primary study. Experiments using simulation and
real data show that our estimation method can predict the reproducibility well. Thus, RR can be used to
guide the experimental design of the replication study, and can also be used to check whether there are
other factors affecting either the primary study’s or the replication study’s results, such as bias and
measurement errors.

FIR is proposed to quantify the probability that a primary association is still a true positive even when 
it is not reproduced in the replication study. The irreproduced associations with high FIRs also have a high 
confidence to be truly associated ones. People can use FIR estimation to prevent some irreproduced results 
from being discarded. 

Appendix

A Detailed deduction of RR and FIR 


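The relationship between RR, $\mathrm{fdr}^{(1)}$ and $\eta^{(2)}$ can be derived from the law of total probability:

\[
\begin{aligned}
RR &= P(\mathrm{sgn}(z^{(1)}) Z^{(2)} > z_{\alpha_2} \mid z^{(1)}) \\
   &= P(\mathrm{sgn}(z^{(1)}) Z^{(2)} > z_{\alpha_2}, \mathcal{H}_0 \mid z^{(1)})
    + P(\mathrm{sgn}(z^{(1)}) Z^{(2)} > z_{\alpha_2}, \mathcal{H}_1 \mid z^{(1)}) \\
   &= P(\mathcal{H}_0 \mid z^{(1)}) P(\mathrm{sgn}(z^{(1)}) Z^{(2)} > z_{\alpha_2} \mid \mathcal{H}_0, z^{(1)})
    + P(\mathcal{H}_1 \mid z^{(1)}) P(\mathrm{sgn}(z^{(1)}) Z^{(2)} > z_{\alpha_2} \mid \mathcal{H}_1, z^{(1)}) \\
   &= \mathrm{fdr}^{(1)} \alpha_2 + (1 - \mathrm{fdr}^{(1)}) \eta^{(2)},
\end{aligned}
\tag{A.1}
\]

where

\[
\begin{aligned}
\eta^{(2)} &= P(\mathrm{sgn}(z^{(1)}) Z^{(2)} > z_{\alpha_2} \mid \mathcal{H}_1, z^{(1)}) \\
 &= \int_{-\infty}^{\infty} P(\mathrm{sgn}(z^{(1)}) Z^{(2)} > z_{\alpha_2}, \mu \mid \mathcal{H}_1, z^{(1)}) \, d\mu \\
 &= \int_{-\infty}^{\infty} P(\mathrm{sgn}(z^{(1)}) Z^{(2)} > z_{\alpha_2} \mid \mu, \mathcal{H}_1, z^{(1)}) \, p(\mu \mid \mathcal{H}_1, z^{(1)}) \, d\mu \\
 &= \int_{-\infty}^{\infty} \beta^{(2)}(\mu) \, p(\mu \mid \mathcal{H}_1, z^{(1)}) \, d\mu \\
 &= E(\beta^{(2)}(\mu) \mid z^{(1)}, \mathcal{H}_1).
\end{aligned}
\tag{A.2}
\]

The relationship between FIR, $\mathrm{fdr}^{(1)}$ and $\eta^{(2)}$ can be derived using the Bayes formula:

\[
\begin{aligned}
FIR &= P(\mathcal{H}_1 \mid \mathrm{sgn}(z^{(1)}) Z^{(2)} \le z_{\alpha_2}, z^{(1)}) \\
    &= \frac{P(\mathcal{H}_1 \mid z^{(1)}) P(\mathrm{sgn}(z^{(1)}) Z^{(2)} \le z_{\alpha_2} \mid \mathcal{H}_1, z^{(1)})}
            {P(\mathrm{sgn}(z^{(1)}) Z^{(2)} \le z_{\alpha_2} \mid z^{(1)})} \\
    &= \frac{(1 - \mathrm{fdr}^{(1)})(1 - \eta^{(2)})}{1 - RR}.
\end{aligned}
\]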
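
As an illustration of (A.1) and of the Bayes-formula expression for FIR, the following minimal Python sketch combines a local false discovery rate, a replication significance level and a Bayesian predictive power into RR and FIR. The function name `rr_and_fir` and its argument names are hypothetical. The sketch assumes that the null component of RR equals `alpha2` (that is, the replication threshold is read as the upper `alpha2` quantile of the standard normal distribution), and it returns NaN for FIR when RR = 1, where FIR is undefined.

```python
import math


def rr_and_fir(fdr1, eta2, alpha2):
    """Return (RR, FIR) from the primary local fdr, the predictive power
    of the replication study and the replication significance level."""
    # (A.1): mix the null and non-null reproduction probabilities.
    rr = fdr1 * alpha2 + (1.0 - fdr1) * eta2
    # Bayes formula: P(H1 | not reproduced, z1); undefined when RR = 1.
    denom = 1.0 - rr
    if denom <= 0.0:
        return rr, math.nan
    fir = (1.0 - fdr1) * (1.0 - eta2) / denom
    return rr, fir


# Hypothetical inputs: fdr = 0.1, predictive power = 0.8, alpha2 = 0.05.
rr, fir = rr_and_fir(0.1, 0.8, 0.05)
print(round(rr, 3), round(fir, 3))  # approximately 0.725 and 0.655
```

The example shows why an irreproduced finding is not automatically a false positive: even with a moderately powered replication, more than half of the posterior mass can remain on the alternative.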


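B Derivation of $\mathrm{fdr}^{(1)}$, $\eta^{(2)}$ under a two-component mixture prior

The following property of the multivariate Gaussian distribution can be used to calculate $\mathrm{fdr}^{(1)}$ and $\eta^{(2)}$.

Property 1. If $Z \mid \mu \sim N_p(\mu, \Sigma)$ and $\mu \sim N_p(\mu_0, \Sigma_0)$, then

\[
Z \sim N_p(\mu_0, \Sigma + \Sigma_0) \quad \text{and} \quad \mu \mid z \sim N_p(W\mu_0 + (I - W)z, (I - W)\Sigma),
\tag{B.1}
\]

with $W = \Sigma(\Sigma_0 + \Sigma)^{-1}$.

The proof of Property 1 can be found in Bishop (2006, Chapter 2).

By using Property 1, the distribution of the test statistic $Z^{(1)}$ is:

\[
Z^{(1)} \sim \pi_0 N(0, 1) + (1 - \pi_0) N\left(0, 1 + (\sigma_0/\sigma^{(1)})^2\right).
\tag{B.2}
\]

Hence the local false discovery rate of the primary study can be calculated as follows:

\[
\mathrm{fdr}^{(1)} = \frac{\pi_0 \phi(z^{(1)})}
{\pi_0 \phi(z^{(1)}) + (1 - \pi_0) \dfrac{1}{\sqrt{1 + (\sigma_0/\sigma^{(1)})^2}} \, \phi\!\left(\dfrac{z^{(1)}}{\sqrt{1 + (\sigma_0/\sigma^{(1)})^2}}\right)},
\tag{B.3}
\]

where $\phi(x)$ is the pdf of the standard normal distribution.

Since $(\mu \mid \mathcal{H}_1) \sim N(0, \sigma_0^2)$, we can obtain

\[
(\mu \mid z^{(1)}, \mathcal{H}_1) \sim N\left(\lambda \hat{\mu}^{(1)}, \lambda (\sigma^{(1)})^2\right),
\tag{B.4}
\]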
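where $\lambda = \dfrac{1}{1 + (\sigma^{(1)})^2/\sigma_0^2}$ plays a shrinkage effect. The posterior distribution of $Z^{(2)}$ under $\mathcal{H}_1$ reads

\[
(Z^{(2)} \mid z^{(1)}, \mathcal{H}_1) \sim N\left(\mu^*, (\sigma^*)^2\right), \quad
\mu^* = \frac{\lambda \hat{\mu}^{(1)}}{\sigma^{(2)}}, \quad
(\sigma^*)^2 = 1 + \lambda \left(\frac{\sigma^{(1)}}{\sigma^{(2)}}\right)^2.
\tag{B.5}
\]

The Bayesian predictive power of the replication study can be calculated as follows:

\[
\eta^{(2)} = \Phi\left(\frac{\mathrm{sgn}(z^{(1)}) \mu^* - z_{\alpha_2}}{\sigma^*}\right),
\tag{B.6}
\]

where $\Phi(x)$ is the cdf of the standard normal distribution.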
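
The quantities in this appendix can be sketched numerically as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: it assumes the test statistic is the effect estimate divided by its standard error, that the replication threshold is the upper `alpha2` quantile of the standard normal distribution, and that the shrinkage factor is $\sigma_0^2/(\sigma_0^2 + (\sigma^{(1)})^2)$ as implied by Property 1. The function names `local_fdr` and `predictive_power` and all argument names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def local_fdr(z1, pi0, sigma0, se1):
    """Local false discovery rate of the primary study under the
    two-component mixture pi0 * N(0, 1) + (1 - pi0) * N(0, 1 + (sigma0/se1)^2)."""
    alt_sd = sqrt(1.0 + (sigma0 / se1) ** 2)
    null_part = pi0 * STD_NORMAL.pdf(z1)
    alt_part = (1.0 - pi0) * NormalDist(0.0, alt_sd).pdf(z1)
    return null_part / (null_part + alt_part)


def predictive_power(z1, sigma0, se1, se2, alpha2):
    """Bayesian predictive power of the replication study given z1 and H1."""
    # Shrinkage factor applied to the primary effect estimate.
    lam = sigma0 ** 2 / (sigma0 ** 2 + se1 ** 2)
    # Predictive mean and standard deviation of Z2 given z1 (effect = z1 * se1).
    mean2 = lam * z1 * se1 / se2
    sd2 = sqrt(1.0 + lam * (se1 / se2) ** 2)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha2)
    sign = 1.0 if z1 >= 0 else -1.0
    # Probability that Z2 exceeds the threshold in the direction of z1.
    return STD_NORMAL.cdf((sign * mean2 - z_crit) / sd2)


# Hypothetical inputs: pi0 = 0.9, sigma0 = 0.2, equal standard errors of 0.05.
print(local_fdr(5.0, 0.9, 0.2, 0.05))
print(predictive_power(5.0, 0.2, 0.05, 0.05, 0.05))
```

Both functions depend on the primary statistic only through its magnitude, so a strongly negative and a strongly positive statistic of the same size give the same values.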
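
C Derivation of the $\sigma_0^2$ estimator

From (B.2), the distribution of $Z^{(1)}$ is a two-component Gaussian mixture model. So we have

\[
(Z^{(1)})^2 \sim \pi_0 \chi_1^2 + (1 - \pi_0) \left(1 + (\sigma_0/\sigma^{(1)})^2\right) \chi_1^2,
\tag{C.1}
\]

where $\chi_1^2$ is the $\chi^2$ distribution with 1 degree of freedom (df). The expectation reads

\[
E\left((Z^{(1)})^2\right) = \pi_0 + (1 - \pi_0)\left(1 + (\sigma_0/\sigma^{(1)})^2\right).
\tag{C.2}
\]

For all $m$ SNPs, the following can be obtained:

\[
\sum_{i=1}^{m} E\left((Z_i^{(1)})^2\right) = \pi_0 m + (1 - \pi_0)\left(m + \sigma_0^2 \sum_{i=1}^{m} \left(1/\sigma_i^{(1)}\right)^2\right).
\tag{C.3}
\]

By substituting the expectations with the observed values $(z_i^{(1)})^2$ and $\pi_0$ with its estimate $\hat{\pi}_0$, we can get the estimator for $\sigma_0^2$:

\[
\hat{\sigma}_0^2 = \left[\frac{\sum_{i=1}^{m} (z_i^{(1)})^2 - m\hat{\pi}_0}{1 - \hat{\pi}_0} - m\right] \Bigg/ \sum_{i=1}^{m} \left(1/\sigma_i^{(1)}\right)^2.
\tag{C.4}
\]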
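
The moment estimator derived in this appendix can be sketched in a few lines of Python. The function name `estimate_sigma0_sq`, the simulated data and the truncation of negative estimates at zero are assumptions made for illustration; the truncation in particular is not part of the derivation above.

```python
import random
from math import sqrt


def estimate_sigma0_sq(z1, se1, pi0_hat):
    """Method-of-moments estimate of the prior effect-size variance from
    primary z statistics, their standard errors and a null-proportion estimate."""
    m = len(z1)
    sum_z_sq = sum(z * z for z in z1)
    sum_inv_var = sum(1.0 / (s * s) for s in se1)
    numerator = (sum_z_sq - pi0_hat * m) / (1.0 - pi0_hat) - m
    # Assumption: clip at zero, since sampling noise can push the moment
    # estimate of a variance below zero.
    return max(numerator / sum_inv_var, 0.0)


# Simulated check (hypothetical data): pi0 = 0.9, sigma0 = 0.2, se = 0.05,
# so non-null statistics have variance 1 + (0.2 / 0.05)^2 = 17.
random.seed(1)
z_null = [random.gauss(0.0, 1.0) for _ in range(18000)]
z_alt = [random.gauss(0.0, sqrt(17.0)) for _ in range(2000)]
z_all = z_null + z_alt
sigma0_sq_hat = estimate_sigma0_sq(z_all, [0.05] * len(z_all), 0.9)
print(round(sigma0_sq_hat, 4))
```

In the simulation the estimate should be close to the true value $0.2^2 = 0.04$; its accuracy depends directly on the accuracy of the null-proportion estimate passed in.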

D RR and FIR in T2D data from DIAGRAM


Table 3: RR and FIR results for T2D data in DIAGRAM. Column P1 is the p-value in the primary study;
column P2 is the p-value in the replication study; column Pmeta is the p-value in the meta-analysis. Column
RR 95%CI is the 95% confidence interval for RR. Column FIR 95%CI is the 95% confidence interval for
FIR.

No.  SNP         P1         P2         Pmeta       RR     RR 95%CI         FIR    FIR 95%CI
1    rs1801282   7.401E-10  4.707E-05  1.013E-12   0.779  (0.769, 0.788)   1.000  (1.000, 1.000)
2    rs11711477  1.339E-11  6.166E-08  2.151E-17   0.951  (0.949, 0.953)   1.000  (1.000, 1.000)
3    rs1801214   5.073E-10  2.887E-07  3.149E-15   0.913  (0.910, 0.916)   1.000  (1.000, 1.000)
4    rs9348440   2.859E-12  1.369E-06  1.638E-16   0.923  (0.918, 0.928)   1.000  (1.000, 1.000)
5    rs4710940   3.553E-15  9.980E-09  2.212E-21   0.991  (0.991, 0.992)   1.000  (1.000, 1.000)
6    rs6931514   0.000E+00  1.342E-13  5.207E-32   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
7    rs9465871   0.000E+00  4.163E-11  6.427E-26   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
8    rs7741604   4.868E-09  1.750E-03  3.803E-10   0.680  (0.670, 0.689)   1.000  (1.000, 1.000)
9    rs864745    1.506E-12  3.423E-06  2.165E-16   0.938  (0.936, 0.940)   1.000  (1.000, 1.000)
10   rs498475    1.398E-08  6.537E-07  1.568E-13   0.866  (0.862, 0.870)   1.000  (1.000, 1.000)
11   rs11774700  5.053E-11  1.251E-07  1.414E-15   0.993  (0.992, 0.993)   1.000  (1.000, 1.000)
12   rs10811661  1.976E-14  1.210E-14  3.450E-27   0.953  (0.950, 0.956)   1.000  (1.000, 1.000)
13   rs2798253   3.249E-12  2.282E-04  5.610E-14   0.863  (0.858, 0.869)   1.000  (1.000, 1.000)
14   rs2421943   5.073E-10  8.855E-01  6.072E-09   0.001  (0.001, 0.001)   1.000  (1.000, 1.000)
15   rs7911264   1.339E-11  4.413E-07  5.623E-16   0.984  (0.983, 0.985)   1.000  (1.000, 1.000)
16   rs7923866   2.551E-13  5.355E-08  1.126E-18   0.990  (0.989, 0.990)   1.000  (1.000, 1.000)
17   rs7917983   0.000E+00  1.442E-08  2.478E-27   0.998  (0.998, 0.999)   1.000  (1.000, 1.000)
18   rs10128255  2.854E-08  5.551E-16  2.103E-22   0.923  (0.919, 0.926)   1.000  (1.000, 1.000)
19   rs12266632  9.452E-11  1.256E-10  2.241E-19   0.876  (0.861, 0.891)   1.000  (1.000, 1.000)
20   rs10787472  0.000E+00  0.000E+00  1.336E-58   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
21   rs12255372  0.000E+00  0.000E+00  2.537E-112  1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
22   rs11196212  7.313E-11  1.306E-12  1.326E-21   0.880  (0.877, 0.883)   1.000  (1.000, 1.000)
23   rs10765573  5.073E-10  6.081E-04  3.363E-11   0.888  (0.884, 0.892)   1.000  (1.000, 1.000)
24   rs12149832  1.339E-11  3.622E-12  6.775E-22   0.967  (0.966, 0.969)   1.000  (1.000, 1.000)


E RR and FIR in LDL Cholesterol data from GLGC


Table 4: RR and FIR results for LDL Cholesterol data in GLGC. Column P1 is the p-value in the primary
study; column P2 is the p-value in the replication study; column Pmeta is the p-value in the meta-analysis.
Column RR 95%CI is the 95% confidence interval for RR. Column FIR 95%CI is the 95% confidence
interval for FIR. When RR = 1, FIR cannot be obtained.

No.  SNP         P1         P2         Pmeta       RR     RR 95%CI         FIR    FIR 95%CI
1    rs2304130   0.000E+00  3.236E-12  1.004E-34   0.999  (0.999, 1.000)   1.000  (1.000, 1.000)
2    rs7832643   7.963E-09  2.739E-12  4.619E-19   0.709  (0.705, 0.714)   1.000  (1.000, 1.000)
3    rs10808546  0.000E+00  0.000E+00  1.475E-47   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
4    rs7515901   4.420E-09  1.166E-14  5.260E-21   0.557  (0.549, 0.566)   1.000  (1.000, 1.000)
5    rs10069744  3.650E-10  3.165E-05  3.143E-13   0.779  (0.769, 0.789)   1.000  (1.000, 1.000)
6    rs13344893  1.510E-14  0.000E+00  5.083E-30   0.953  (0.951, 0.956)   1.000  (1.000, 1.000)
7    rs6725189   1.332E-15  0.000E+00  4.481E-41   0.984  (0.983, 0.985)   1.000  (1.000, 1.000)
8    rs16996148  0.000E+00  0.000E+00  5.057E-49   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
9    rs7251031   7.840E-13  3.252E-12  3.653E-23   0.869  (0.865, 0.874)   1.000  (1.000, 1.000)
10   rs12286037  1.658E-11  1.130E-10  2.245E-20   0.781  (0.766, 0.797)   1.000  (1.000, 1.000)
11   rs631106    0.000E+00  0.000E+00  3.108E-34   0.993  (0.993, 0.993)   1.000  (1.000, 1.000)
12   rs3791981   0.000E+00  0.000E+00  1.263E-44   0.999  (0.999, 1.000)   1.000  (1.000, 1.000)
13   rs2385114   5.646E-10  4.558E-12  3.413E-20   0.853  (0.849, 0.857)   1.000  (1.000, 1.000)
14   rs3810444   1.024E-12  1.049E-03  1.098E-13   0.690  (0.671, 0.710)   1.000  (1.000, 1.000)
15   rs10422616  5.646E-09  4.712E-07  2.843E-14   0.722  (0.718, 0.727)   1.000  (1.000, 1.000)
16   rs4518686   9.917E-09  0.000E+00  1.700E-25   0.802  (0.798, 0.807)   1.000  (1.000, 1.000)
17   rs10062361  0.000E+00  0.000E+00  9.229E-59   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
18   rs12708967  4.138E-08  1.777E-05  6.755E-12   0.435  (0.428, 0.443)   1.000  (1.000, 1.000)
19   rs2479409   0.000E+00  0.000E+00  2.905E-55   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
20   rs413582    4.467E-12  0.000E+00  2.488E-32   0.916  (0.914, 0.919)   1.000  (1.000, 1.000)
21   rs2073547   1.843E-14  1.777E-10  4.249E-23   0.826  (0.821, 0.832)   1.000  (1.000, 1.000)
22   rs655246    0.000E+00  0.000E+00  2.545E-46   0.999  (0.999, 0.999)   1.000  (1.000, 1.000)
23   rs10198175  3.553E-15  0.000E+00  3.494E-34   0.944  (0.940, 0.949)   1.000  (1.000, 1.000)
24   rs2954038   8.882E-16  3.191E-12  4.043E-26   0.961  (0.960, 0.963)   1.000  (1.000, 1.000)
25   rs2737252   1.070E-08  1.677E-07  1.881E-14   0.728  (0.722, 0.733)   1.000  (1.000, 1.000)
26   rs4703646   0.000E+00  0.000E+00  5.373E-35   0.997  (0.997, 0.997)   1.000  (1.000, 1.000)
27   rs2642438   6.851E-10  6.737E-09  5.250E-17   0.834  (0.830, 0.839)   1.000  (1.000, 1.000)
28   rs9302635   2.070E-08  1.694E-09  4.569E-16   0.592  (0.585, 0.600)   1.000  (1.000, 1.000)
29   rs17584208  0.000E+00  0.000E+00  1.513E-51   0.996  (0.995, 0.997)   1.000  (1.000, 1.000)
30   rs4803750   0.000E+00  0.000E+00  2.688E-173  1.000  (1.000, 1.000)
31   rs688       0.000E+00  0.000E+00  3.040E-48   0.998  (0.997, 0.998)   1.000  (1.000, 1.000)
32   rs12410656  2.197E-08  3.282E-03  5.072E-10   0.052  (0.050, 0.055)   1.000  (1.000, 1.000)
33   rs8044335   4.509E-10  2.607E-08  1.337E-16   0.822  (0.818, 0.826)   1.000  (1.000, 1.000)
34   rs5744680   0.000E+00  0.000E+00  7.507E-68   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
35   rs7523242   0.000E+00  0.000E+00  2.067E-45   0.999  (0.998, 0.999)   1.000  (1.000, 1.000)
36   rs6129778   6.113E-10  1.065E-08  7.860E-17   0.850  (0.845, 0.855)   1.000  (1.000, 1.000)
37   rs10832962  5.156E-10  1.804E-07  1.244E-15   0.841  (0.837, 0.845)   1.000  (1.000, 1.000)
38   rs10947332  3.798E-08  1.985E-13  2.257E-19   0.626  (0.616, 0.637)   1.000  (1.000, 1.000)
39   rs622342    7.061E-09  1.530E-08  1.244E-15   0.842  (0.838, 0.846)   1.000  (1.000, 1.000)
40   rs2288912   3.600E-08  7.003E-03  1.078E-08   0.626  (0.621, 0.631)   1.000  (1.000, 1.000)
41   rs3786721   1.754E-14  0.000E+00  7.448E-35   0.949  (0.947, 0.950)   1.000  (1.000, 1.000)
42   rs693       0.000E+00  0.000E+00  4.357E-139  1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
43   rs505151    3.798E-08  7.804E-12  4.718E-18   0.431  (0.405, 0.459)   1.000  (1.000, 1.000)
44   rs11881156  0.000E+00  0.000E+00  1.574E-61   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
45   rs174570    9.241E-11  2.842E-14  4.965E-23   0.845  (0.839, 0.851)   1.000  (1.000, 1.000)
46   rs6756629   0.000E+00  3.142E-14  3.410E-50   0.920  (0.910, 0.929)   1.000  (1.000, 1.000)
47   rs4962153   3.843E-10  1.033E-05  3.893E-14   0.237  (0.230, 0.244)   1.000  (1.000, 1.000)
48   rs10402271  0.000E+00  0.000E+00  1.945E-131  1.000  (1.000, 1.000)
49   rs10903129  2.399E-10  2.745E-10  7.660E-19   0.838  (0.835, 0.842)   1.000  (1.000, 1.000)
50   rs7205804   0.000E+00  1.640E-09  2.174E-27   0.996  (0.996, 0.996)   1.000  (1.000, 1.000)
51   rs2738464   2.603E-08  1.529E-05  4.580E-12   0.561  (0.550, 0.574)   1.000  (1.000, 1.000)
52   rs1000237   4.936E-12  2.317E-04  1.261E-13   0.917  (0.915, 0.920)   1.000  (1.000, 1.000)
53   rs6662286   0.000E+00  9.770E-15  8.148E-42   0.982  (0.979, 0.984)   1.000  (1.000, 1.000)
54   rs4360309   1.549E-09  8.146E-13  3.245E-20   0.719  (0.715, 0.724)   1.000  (1.000, 1.000)
55   rs6065311   2.442E-15  0.000E+00  5.006E-31   0.989  (0.989, 0.990)   1.000  (1.000, 1.000)
56   rs3786722   0.000E+00  0.000E+00  7.761E-69   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
57   rs11668536  3.553E-15  0.000E+00  4.181E-30   0.948  (0.946, 0.950)   1.000  (1.000, 1.000)
58   rs17135399  4.997E-12  7.671E-12  7.810E-22   0.573  (0.554, 0.593)   1.000  (1.000, 1.000)
59   rs7552841   3.105E-11  1.593E-07  6.081E-17   0.450  (0.445, 0.456)   1.000  (1.000, 1.000)
60   rs7030248   8.480E-12  5.626E-05  2.155E-14   0.882  (0.879, 0.885)   1.000  (1.000, 1.000)
61   rs6589566   6.037E-12  2.220E-16  2.413E-26   0.843  (0.831, 0.855)   1.000  (1.000, 1.000)
62   rs752434    5.536E-09  2.403E-11  2.258E-18   0.703  (0.698, 0.708)   1.000  (1.000, 1.000)
63   rs10401969  0.000E+00  0.000E+00  9.192E-61   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
64   rs4341893   0.000E+00  0.000E+00  1.594E-56   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
65   rs540796    0.000E+00  1.793E-13  7.633E-37   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
66   rs4426495   0.000E+00  2.109E-15  6.043E-32   0.997  (0.996, 0.997)   1.000  (1.000, 1.000)
67   rs568938    0.000E+00  0.000E+00  7.217E-155  1.000  (1.000, 1.000)
68   rs6739502   1.973E-09  0.000E+00  8.875E-25   0.834  (0.831, 0.838)   1.000  (1.000, 1.000)
69   rs2244608   2.477E-09  4.140E-13  2.114E-20   0.776  (0.771, 0.780)   1.000  (1.000, 1.000)
70   rs17424122  5.068E-10  3.203E-04  1.362E-12   0.132  (0.125, 0.141)   1.000  (1.000, 1.000)
71   rs769450    3.135E-13  3.649E-01  5.366E-13   0.001  (0.001, 0.002)   1.000  (1.000, 1.000)
72   rs4299376   0.000E+00  0.000E+00  8.724E-73   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
73   rs10403668  2.380E-11  8.471E-08  3.111E-17   0.863  (0.857, 0.869)   1.000  (1.000, 1.000)
74   rs174448    3.528E-11  5.276E-05  1.668E-14   0.245  (0.241, 0.249)   1.000  (1.000, 1.000)
75   rs7701925   0.000E+00  6.267E-04  6.506E-27   0.113  (0.111, 0.116)   1.000  (1.000, 1.000)
76   rs6722374   6.555E-10  2.585E-03  1.500E-10   0.812  (0.808, 0.815)   1.000  (1.000, 1.000)
77   rs17398765  6.373E-14  0.000E+00  1.879E-33   0.848  (0.835, 0.860)   1.000  (1.000, 1.000)
78   rs16973520  2.931E-11  1.825E-11  6.538E-21   0.922  (0.919, 0.925)   1.000  (1.000, 1.000)
79   rs174601    0.000E+00  0.000E+00  3.385E-37   0.992  (0.992, 0.993)   1.000  (1.000, 1.000)
80   rs3106167   4.516E-12  1.013E-05  2.665E-15   0.919  (0.914, 0.924)   1.000  (1.000, 1.000)
81   rs12710745  1.530E-13  0.000E+00  1.479E-29   0.947  (0.945, 0.949)   1.000  (1.000, 1.000)
82   rs4722551   1.797E-09  7.813E-08  1.468E-15   0.689  (0.682, 0.697)   1.000  (1.000, 1.000)
83   rs10515198  6.943E-12  1.015E-12  9.264E-23   0.872  (0.865, 0.880)   1.000  (1.000, 1.000)
84   rs4808199   0.000E+00  1.193E-06  8.426E-21   0.991  (0.991, 0.992)   1.000  (1.000, 1.000)
85   rs4507059   3.987E-10  1.301E-11  7.174E-20   0.813  (0.809, 0.818)   1.000  (1.000, 1.000)
86   rs3903032   5.730E-10  1.665E-15  2.501E-23   0.786  (0.776, 0.796)   1.000  (1.000, 1.000)
87   rs13465     0.000E+00  1.110E-16  8.517E-34   0.905  (0.894, 0.916)   1.000  (1.000, 1.000)
88   rs12608822  1.220E-11  1.328E-05  1.914E-15   0.401  (0.381, 0.423)   1.000  (1.000, 1.000)
89   rs912540    1.191E-11  3.157E-12  4.812E-22   0.897  (0.893, 0.902)   1.000  (1.000, 1.000)
90   rs1168114   0.000E+00  0.000E+00  3.326E-36   0.998  (0.998, 0.998)   1.000  (1.000, 1.000)
91   rs17150482  3.100E-08  4.200E-05  1.677E-11   0.640  (0.635, 0.646)   1.000  (1.000, 1.000)
92   rs6511720   0.000E+00  0.000E+00  3.785E-287  1.000  (1.000, 1.000)
93   rs934197    0.000E+00  1.667E-03  4.659E-84   0.195  (0.191, 0.199)   1.000  (1.000, 1.000)
94   rs10198972  2.705E-12  5.551E-16  4.175E-26   0.692  (0.668, 0.718)   1.000  (1.000, 1.000)
95   rs9989419   6.937E-09  1.368E-05  1.225E-12   0.694  (0.689, 0.699)   1.000  (1.000, 1.000)
96   rs10893500  1.815E-09  2.209E-14  8.298E-22   0.801  (0.794, 0.809)   1.000  (1.000, 1.000)
97   rs635634    0.000E+00  1.488E-14  9.338E-45   0.993  (0.992, 0.993)   1.000  (1.000, 1.000)
98   rs6711016   0.000E+00  0.000E+00  1.370E-37   0.991  (0.990, 0.991)   1.000  (1.000, 1.000)
99   rs12448528  8.317E-09  1.461E-05  1.116E-12   0.276  (0.271, 0.282)   1.000  (1.000, 1.000)
100  rs4077440   7.394E-14  1.079E-11  1.160E-23   0.972  (0.971, 0.973)   1.000  (1.000, 1.000)
101  rs2972564   1.599E-11  6.825E-13  2.846E-22   0.655  (0.642, 0.669)   1.000  (1.000, 1.000)
102  rs10888897  0.000E+00  1.407E-10  1.495E-33   0.995  (0.995, 0.995)   1.000  (1.000, 1.000)
103  rs8108762   2.321E-08  5.551E-16  7.272E-22   0.737  (0.732, 0.742)   1.000  (1.000, 1.000)
104  rs2278444   0.000E+00  0.000E+00  1.640E-37   0.997  (0.997, 0.997)   1.000  (1.000, 1.000)
105  rs12294259  2.917E-12  7.115E-12  2.719E-22   0.879  (0.867, 0.891)   1.000  (1.000, 1.000)
106  rs4148218   5.875E-11  7.889E-12  6.413E-21   0.826  (0.820, 0.832)   1.000  (1.000, 1.000)
107  rs7571647   0.000E+00  3.413E-11  1.880E-43   0.945  (0.940, 0.950)   1.000  (1.000, 1.000)
108  rs17231506  0.000E+00  1.200E-04  9.691E-33   0.940  (0.937, 0.942)   1.000  (1.000, 1.000)
109  rs4738684   9.671E-12  3.481E-09  4.619E-19   0.918  (0.915, 0.920)   1.000  (1.000, 1.000)
110  rs9305020   0.000E+00  0.000E+00  3.579E-167  1.000  (1.000, 1.000)
111  rs17034539  9.859E-14  6.588E-13  8.673E-25   0.956  (0.953, 0.958)   1.000  (1.000, 1.000)
112  rs6016381   2.220E-16  7.925E-08  1.264E-21   0.990  (0.990, 0.991)   1.000  (1.000, 1.000)
113  rs4530754   1.549E-09  9.395E-07  2.191E-14   0.841  (0.838, 0.844)   1.000  (1.000, 1.000)
114  rs16979372  0.000E+00  0.000E+00  6.191E-52   0.993  (0.991, 0.995)   1.000  (1.000, 1.000)
115  rs8176720   1.760E-09  1.180E-10  2.397E-18   0.838  (0.835, 0.842)   1.000  (1.000, 1.000)
116  rs4927207   0.000E+00  0.000E+00  2.762E-45   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
117  rs6511727   7.586E-12  1.719E-03  2.560E-12   0.897  (0.894, 0.899)   1.000  (1.000, 1.000)
118  rs2479408   0.000E+00  2.261E-04  8.723E-27   0.871  (0.866, 0.877)   1.000  (1.000, 1.000)
119  rs531819    0.000E+00  0.000E+00  4.421E-147  1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
120  rs13420469  0.000E+00  0.000E+00  2.393E-60   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
121  rs588245    1.157E-08  3.121E-04  3.044E-11   0.092  (0.091, 0.094)   1.000  (1.000, 1.000)
122  rs688386    0.000E+00  0.000E+00  2.279E-44   0.999  (0.999, 0.999)   1.000  (1.000, 1.000)
123  rs6878680   1.855E-11  0.000E+00  2.364E-32   0.908  (0.905, 0.910)   1.000  (1.000, 1.000)
124  rs11753995  3.357E-08  2.220E-16  7.994E-22   0.633  (0.625, 0.641)   1.000  (1.000, 1.000)
125  rs12748152  2.111E-09  1.487E-06  4.012E-14   0.672  (0.659, 0.686)   1.000  (1.000, 1.000)
126  rs11244041  1.683E-08  5.281E-01  2.932E-08   0.000  (0.000, 0.000)   1.000  (1.000, 1.000)
127  rs1864163   0.000E+00  9.250E-08  2.704E-22   0.973  (0.972, 0.975)   1.000  (1.000, 1.000)
128  rs10495907  2.192E-10  1.576E-06  5.824E-15   0.810  (0.803, 0.817)   1.000  (1.000, 1.000)
129  rs4926670   0.000E+00  0.000E+00  1.162E-45   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
130  rs6729410   2.488E-09  7.772E-16  2.359E-22   0.660  (0.655, 0.665)   1.000  (1.000, 1.000)
131  rs2194562   4.697E-10  8.657E-06  6.793E-14   0.715  (0.704, 0.727)   1.000  (1.000, 1.000)
132  rs461473    2.648E-08  7.008E-05  2.526E-11   0.545  (0.532, 0.558)   1.000  (1.000, 1.000)
133  rs157580    0.000E+00  0.000E+00  8.087E-127  1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
134  rs9929488   3.502E-08  7.434E-08  2.694E-14   0.715  (0.708, 0.723)   1.000  (1.000, 1.000)
135  rs2524299   6.356E-12  6.704E-08  5.892E-18   0.841  (0.835, 0.848)   1.000  (1.000, 1.000)
136  rs12677676  0.000E+00  3.671E-02  7.490E-18   0.109  (0.107, 0.112)   1.000  (1.000, 1.000)
137  rs9293656   0.000E+00  0.000E+00  5.423E-44   0.999  (0.999, 0.999)   1.000  (1.000, 1.000)
138  rs7551981   0.000E+00  0.000E+00  2.009E-35   0.996  (0.996, 0.997)   1.000  (1.000, 1.000)
139  rs11206551  6.373E-14  1.963E-11  1.917E-23   0.969  (0.968, 0.971)   1.000  (1.000, 1.000)
140  rs17800760  9.193E-14  2.688E-10  3.694E-22   0.938  (0.935, 0.942)   1.000  (1.000, 1.000)
141  rs253412    0.000E+00  0.000E+00  5.339E-42   0.996  (0.996, 0.997)   1.000  (1.000, 1.000)
142  rs2075650   0.000E+00  0.000E+00  1.835E-226  1.000  (1.000, 1.000)
143  rs17646665  1.437E-11  1.099E-14  2.085E-24   0.870  (0.856, 0.884)   1.000  (1.000, 1.000)
144  rs8103315   0.000E+00  1.143E-07  1.249E-22   0.920  (0.915, 0.925)   1.000  (1.000, 1.000)
145  rs217386    1.926E-11  4.911E-12  1.264E-21   0.880  (0.877, 0.883)   1.000  (1.000, 1.000)
146  rs1175544   1.399E-09  2.120E-05  5.799E-13   0.794  (0.789, 0.798)   1.000  (1.000, 1.000)
147  rs4240624   1.084E-13  8.327E-15  1.458E-26   0.906  (0.899, 0.913)   1.000  (1.000, 1.000)
148  rs6547409   0.000E+00  0.000E+00  4.256E-45   0.923  (0.913, 0.932)   1.000  (1.000, 1.000)
149  rs6859      0.000E+00  0.000E+00  1.072E-101  1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
150  rs4704200   0.000E+00  0.000E+00  7.507E-68   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
151  rs2297374   6.541E-13  5.434E-07  1.203E-17   0.950  (0.948, 0.952)   1.000  (1.000, 1.000)
152  rs2000999   0.000E+00  0.000E+00  2.466E-45   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
153  rs754524    0.000E+00  0.000E+00  2.321E-116  1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
154  rs8104483   2.220E-16  0.000E+00  8.050E-35   0.992  (0.991, 0.992)   1.000  (1.000, 1.000)
155  rs8106664   0.000E+00  2.043E-14  1.025E-30   0.995  (0.994, 0.995)   1.000  (1.000, 1.000)
156  rs413380    1.637E-09  6.317E-11  1.554E-18   0.448  (0.423, 0.474)   1.000  (1.000, 1.000)
157  rs17397667  0.000E+00  0.000E+00  4.128E-57   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)
158  rs12720804  2.328E-10  4.194E-06  9.497E-15   0.002  (0.002, 0.003)   0.982  (0.976, 0.988)
159  rs11124924  2.229E-09  2.556E-08  6.302E-16   0.681  (0.665, 0.697)   1.000  (1.000, 1.000)
160  rs714948    6.974E-10  4.339E-06  2.890E-14   0.397  (0.387, 0.408)   1.000  (1.000, 1.000)
161  rs11206510  0.000E+00  0.000E+00  4.993E-62   1.000  (1.000, 1.000)   1.000  (1.000, 1.000)